On Multicast in Asynchronous Networks-on-Chip: Techniques, Architectures, and FPGA Implementation
In this era of exascale computing, conventional synchronous design techniques are facing unprecedented challenges. The consumer electronics market is replete with many-core systems in the range of 16 cores to thousands of cores on chip, integrating billions of transistors. However, with this ever increasing complexity, the traditional design approaches are facing key issues such as increasing chip power, process variability, aging, thermal problems, and scalability. An alternative paradigm that has gained significant interest in the last decade is asynchronous design. Asynchronous designs have several potential advantages: they are naturally energy proportional, burning power only when active, do not require complex clock distribution, are robust to different forms of variability, and provide ease of composability for heterogeneous platforms. Networks-on-chip (NoCs) are an interconnect paradigm that has been introduced to deal with the ever-increasing system complexity. NoCs provide a distributed, scalable, and efficient interconnect solution for today’s many-core systems. Moreover, NoCs are a natural match with asynchronous design techniques, as they separate communication infrastructure and timing from the computational elements. To this end, globally-asynchronous locally-synchronous (GALS) systems that interconnect multiple processing cores, operating at different clock speeds, using an asynchronous NoC, have gained significant interest. While asynchronous NoCs have several advantages, they also face a key challenge of supporting new types of traffic patterns. One such pattern is multicast communication, where a source sends packets to an arbitrary number of destinations. Multicast is not only common in parallel computing, such as for cache coherency, but also for emerging areas such as neuromorphic computing. This important capability has been largely missing from asynchronous NoCs.
This thesis introduces several efficient multicast solutions for these interconnects. In particular, techniques and network architectures are introduced to support high-performance and low-power multicast. Two leading network topologies are the focus: a variant mesh-of-trees (MoT) and a 2D mesh. In addition, for a more realistic implementation and analysis, as well as significantly advancing the field of asynchronous NoCs, this thesis also targets synthesis of these NoCs on commercial FPGAs. While there have been significant advances in FPGA technologies, there has been only limited research on implementing asynchronous NoCs on FPGAs. To this end, a systematic computer-aided design (CAD) methodology has been introduced to efficiently and safely map asynchronous NoCs on FPGAs. Overall, this thesis makes the following three contributions. The first contribution is a multicast solution for a variant MoT network topology. This topology consists of simple low-radix switches, and has been used in high-performance computing platforms. A novel local speculation technique is introduced, where a subset of the network’s switches are speculative and always broadcast every packet. These switches are very simple and have high performance. Speculative switches are surrounded by non-speculative ones that route packets based on their destinations and also throttle any redundant copies created by the former. This hybrid network architecture achieved significant performance and power benefits over other multicast approaches. The second contribution is a multicast solution for a 2D-mesh topology, which is more complex with higher-radix switches and also is more commonly used. A novel continuous-time replication strategy is introduced to optimize the critical multi-way forking operation of a multicast transmission.
In this technique, a multicast packet is first stored in an input port of a switch, from where it is sent through distinct output ports towards different destinations concurrently, at each output’s own rate and in continuous time. This strategy is shown to have significant latency and energy benefits over an approach that performs multicast using multiple distinct serial unicasts to each destination. Finally, a systematic CAD methodology is introduced to synthesize asynchronous NoCs on commercial FPGAs. A two-fold goal is targeted: correctness and high performance. For ease of implementation, only existing FPGA synthesis tools are used. Moreover, since asynchronous NoCs involve special asynchronous components, a comprehensive guide is introduced to map these elements correctly and efficiently. Two asynchronous NoC switches are synthesized using the proposed approach on a leading Xilinx FPGA in 28 nm: one that only handles unicast, and the other that also supports multicast. Both showed significant energy benefits with some performance gains over a state-of-the-art synchronous switch.
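As a rough illustration of why concurrent replication can beat multiple serial unicasts, the toy model below compares the two under a deliberately simple assumption: each output port needs a fixed time to accept one copy of the packet, and nothing else contributes to latency. The function names and the per-port delays are hypothetical and are not taken from the thesis.

```python
def serial_unicast_latency(port_delays):
    """Time until the last destination is served when copies are sent
    one after another (multicast emulated by serial unicasts)."""
    return sum(port_delays)


def concurrent_replication_latency(port_delays):
    """Time until the last destination is served when every output port
    drains its own copy in parallel, each at its own rate."""
    return max(port_delays) if port_delays else 0


# Hypothetical per-port delays (arbitrary time units) for a 3-way fork.
delays = [3, 5, 4]
print(serial_unicast_latency(delays))
print(concurrent_replication_latency(delays))
```

In this simplified model the serial approach pays the sum of the port delays, while the concurrent fork pays only the slowest one; contention, handshaking and buffering effects are ignored.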
Aging-Aware Routing Algorithms for Network-on-Chips
Network-on-Chip (NoC) architectures have emerged as a better replacement for traditional bus-based communication in the many-core era. However, continuous technology scaling has made aging mechanisms, such as Negative Bias Temperature Instability (NBTI) and electromigration, primary concerns in NoC design. In this work, a novel system-level aging model is proposed to model the effects of aging in NoCs caused by (a) asymmetric communication patterns between the network nodes, and (b) runtime traffic variations due to routing policies. This work observes a critical need for a holistic aging analysis, which, when combined with power-performance optimization, poses a multi-objective design challenge. To solve this problem, two different aging-aware routing algorithms are proposed: (a) a congestion-oblivious Mixed Integer Linear Programming (MILP)-based routing algorithm, and (b) a congestion-aware adaptive routing algorithm and router micro-architecture. Extensive experimental evaluations show that the proposed routing algorithms reduce aging-induced power-performance overheads while also improving system robustness.
Unbalanced omega ratio and omega 3 deficiencies in world makes our immune system less effective to fight with virus and other infections
According to the report of a global survey of omega-3 fatty acids, the majority of countries in the world face a deficiency of essential fatty acids, especially omega-3; this very low level of essential fatty acids increases the global risk of chronic disease. Many reports have been published on the role of omega-3 in the immune system in health and in disease, especially in diseases caused by an excessive inflammatory response. Numerous studies have shown that these compounds are immunoregulatory and immunosuppressive and thus may increase susceptibility to infection. They also manipulate the functions of antigen-presenting cells and lymphocytes, including T and B cells, NK cells, LAK cells and also T regulatory cells. In this article, we attempt to elucidate the effect of omega-3 deficiency on our immune system, especially during viral and other infections. In this period of severe viral infections, studies on omega-3 and its role in immunity are of great interest.
Real-Time Fully Unsupervised Domain Adaptation for Lane Detection in Autonomous Driving
While deep neural networks are being utilized heavily for autonomous driving,
they need to be adapted to new unseen environmental conditions for which they
were not trained. We focus on a safety critical application of lane detection,
and propose a lightweight, fully unsupervised, real-time adaptation approach
that only adapts the batch-normalization parameters of the model. We
demonstrate that our technique can perform inference, followed by on-device
adaptation, under a tight constraint of 30 FPS on Nvidia Jetson Orin. It
achieves accuracy (avg. of 92.19%) similar to that of a state-of-the-art
semi-supervised adaptation algorithm, which, however, does not support
real-time adaptation.
Comment: Accepted in 2023 Design, Automation & Test in Europe Conference (DATE
2023) - Late Breaking Result
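The abstract does not say which batch-normalization quantities are updated or how. Purely as an illustration of label-free adaptation that touches nothing but batch-normalization state, the sketch below refreshes one channel's stored mean and variance from an unlabeled target batch. This is an assumed, generic scheme and not the authors' method; all function names and the `momentum` value are hypothetical.

```python
def batch_stats(values):
    """Mean and (biased) variance of one channel's activations in a batch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def adapt_bn_stats(running_mean, running_var, batch, momentum=0.1):
    """Blend the stored statistics with those of an unlabeled target batch."""
    mean, var = batch_stats(batch)
    new_mean = (1 - momentum) * running_mean + momentum * mean
    new_var = (1 - momentum) * running_var + momentum * var
    return new_mean, new_var


def normalize(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Apply batch normalization to a single activation."""
    return gamma * (x - mean) / (var + eps) ** 0.5 + beta


# Hypothetical target-domain activations for one channel.
mean, var = adapt_bn_stats(0.0, 1.0, [2.0, 4.0], momentum=0.5)
print(mean, var)
```

Because only a few scalars per channel change and no labels or backward pass through the full network are needed in this variant, updates of this kind are cheap enough to interleave with inference.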
Inter arm systolic blood pressure difference is associated with a high prevalence of cardio vascular diseases
Background: Blood pressure (BP) recordings often differ between arms. This study aimed to observe the presence of an inter-arm blood pressure difference and its association with hypertension or diabetes. The objective of the study was to establish the prevalence of an inter-arm blood pressure difference and explore its association with obesity and cardiovascular disorder. Methods: A cross-sectional study was conducted at King George’s Medical College, Lucknow, India among 100 first-year MBBS students. After verbal consent was obtained, age, height, weight, waist circumference, hip circumference and family history of hypertension or diabetes were recorded. Results: The systolic blood pressure was 118.8±11.5 mmHg on the right arm and 11.7±7.72 mmHg on the left arm. The mean systolic blood pressure was significantly higher on the right arm. There were 54, 17 and 29 participants with inter-arm systolic blood pressure difference of 30). In 11 of the 100 subjects, an inter-arm systolic blood pressure difference ≥10 mmHg was associated with a family history of diabetes or hypertension. Conclusions: Individuals with an inter-arm blood pressure difference and a family history of hypertension or diabetes are more susceptible to developing cardiovascular disorders in the future.
SMAUG: End-to-End Full-Stack Simulation Infrastructure for Deep Learning Workloads
In recent years, there have been tremendous advances in hardware acceleration
of deep neural networks. However, most of the research has focused on
optimizing accelerator microarchitecture for higher performance and energy
efficiency on a per-layer basis. We find that for overall single-batch
inference latency, the accelerator may only make up 25-40%, with the rest spent
on data movement and in the deep learning software framework. Thus far, it has
been very difficult to study end-to-end DNN performance during early stage
design (before RTL is available) because there are no existing DNN frameworks
that support end-to-end simulation with easy custom hardware accelerator
integration. To address this gap in research infrastructure, we present SMAUG,
the first DNN framework that is purpose-built for simulation of end-to-end deep
learning applications. SMAUG offers researchers a wide range of capabilities
for evaluating DNN workloads, from diverse network topologies to easy
accelerator modeling and SoC integration. To demonstrate the power and value of
SMAUG, we present case studies that show how we can optimize overall
performance and energy efficiency for up to 1.8-5x speedup over a baseline
system, without changing any part of the accelerator microarchitecture, as well
as show how SMAUG can tune an SoC for a camera-powered deep learning pipeline.
Comment: 14 pages, 20 figures
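The observation that the accelerator accounts for only 25-40% of single-batch inference latency can be put in perspective with Amdahl's law. The short calculation below is a sketch under the assumption that only the accelerator portion is sped up and everything else stays unchanged; the function name is hypothetical.

```python
def max_end_to_end_speedup(accel_fraction, accel_speedup=float("inf")):
    """Amdahl's law: overall speedup when only the accelerator portion
    of the latency is made accel_speedup times faster."""
    remaining = (1.0 - accel_fraction) + accel_fraction / accel_speedup
    return 1.0 / remaining


for fraction in (0.25, 0.40):
    bound = max_end_to_end_speedup(fraction)
    print(f"{fraction:.0%} on accelerator -> at most {bound:.2f}x")
```

Under this model, even an infinitely fast accelerator yields at most about 1.33x to 1.67x end to end, which is consistent with the abstract's point that larger gains have to come from data movement and the software stack.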
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performance. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed
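For readers unfamiliar with the first baseline named above, the following is a minimal sketch of Okapi BM25 scoring over tokenized documents. It uses the standard textbook formula with commonly used defaults (`k1=1.5`, `b=0.75`); these values, the toy corpus and the function name are assumptions for illustration and do not reflect the consortium's actual configuration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1; b controls length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus of already-tokenized titles.
docs = [
    "omega 3 fatty acids and immune function".split(),
    "aging aware routing for networks on chip".split(),
    "immune response to viral infection".split(),
]
scores = bm25_scores("immune infection".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

A seed article's title or abstract would play the role of the query here, and the top-ranked documents would be the recommended articles.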
Power-Performance Yield Optimization for MPSoCs Using MILP
In the nanometer technology regime, process variation (PV) causes uncertainties in the processor frequency and leakage power, affecting the overall performance and energy efficiency of Multi-Processor System-on-Chips (MPSoCs). In most cases, Power Yield and Performance Yield are not optimized simultaneously when scheduling tasks at the system level. We demonstrate the significance of optimizing both Power and Performance Yields simultaneously in task scheduling in order to minimize the effects of process variation at the system level. In this paper, we present process-variation-aware task scheduling algorithms and define a new design metric, called Power-Performance Yield (PPY), to guide the scheduling procedure. The PPY is modeled considering the spatial correlation characteristic of systematic process variation, log-normal distributions of leakage power and an energy-aware slack budgeting approach. We propose a novel mathematical formulation using the Mixed Integer Linear Programming (MILP) technique and also employ an improved Simulated Annealing (SA)-based stochastic technique for PPY optimization. The experimental results on TGFF-generated random task graphs and the E3S benchmark suite demonstrate average PPY improvements of 16.9% and 31% over two other SA-based schemes that separately optimize Performance Yield and Power Yield, respectively. With accurate PV-aware modeling, we obtain average PPY improvements of 9.65% and 30.3% under strong correlations and 12.9% and 29.8% under weak correlations when compared to two other existing scheduling schemes that lack appropriate modeling. © 2012 IEEE
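The abstract does not give a formula for PPY. One plausible reading, assumed here only for illustration, is the fraction of manufactured chips that meet both a performance deadline and a power budget. The sketch below computes that joint yield next to the two individual yields; the sample values and the function name are hypothetical.

```python
def yields(samples, deadline, power_budget):
    """Return (performance yield, power yield, joint yield) over sampled chips.

    Each sample is a (completion_time, power) pair for one chip instance.
    """
    n = len(samples)
    perf_ok = sum(1 for t, _ in samples if t <= deadline)
    power_ok = sum(1 for _, p in samples if p <= power_budget)
    both_ok = sum(1 for t, p in samples if t <= deadline and p <= power_budget)
    return perf_ok / n, power_ok / n, both_ok / n


# Hypothetical samples in which faster chips tend to draw more power.
samples = [(8.0, 1.4), (9.0, 1.1), (10.0, 0.9), (11.0, 0.8)]
print(yields(samples, deadline=10.0, power_budget=1.2))
```

In this toy data set each individual yield is 0.75 while the joint yield is only 0.5, which illustrates why optimizing the two yields separately can overestimate the share of usable chips.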
Towards Graceful Aging Degradation in NoCs Through an Adaptive Routing Algorithm
Continuous technology scaling has made aging mechanisms such as Negative Bias Temperature Instability (NBTI) and electromigration primary concerns in Network-on-Chip (NoC) designs. In this paper, we model the effects of these aging mechanisms on NoC components such as routers and links using a novel reliability metric called Traffic Threshold per Epoch (TTpE). We observe a critical need for a robust aging-aware routing algorithm that not only reduces power-performance overheads caused by aging degradation but also minimizes the stress experienced by heavily utilized routers and links. To solve this problem, we propose an aging-aware adaptive routing algorithm and a router microarchitecture that routes packets along the paths that are both least congested and experience minimum aging stress. After an extensive experimental analysis using real workloads, we observe average overhead reductions of 13% and 12.7% in network latency and Energy-Delay-Product-Per-Flit (EDPPF), respectively, and a 10.4% improvement in performance using our aging-aware routing algorithm. © 2012 ACM
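The idea of preferring paths that are both lightly congested and lightly aged can be sketched as a per-hop port selection with a combined cost. The weighted-sum cost, the `alpha` knob, the normalized inputs and the port names below are all assumptions made for illustration; the paper's actual selection logic and its TTpE metric are not reproduced here.

```python
def pick_output_port(candidates, congestion, aging_stress, alpha=0.5):
    """Choose the admissible output port with the lowest combined cost.

    congestion and aging_stress map port name -> value normalized to [0, 1].
    alpha weights congestion against aging stress (hypothetical knob).
    """
    def cost(port):
        return alpha * congestion[port] + (1.0 - alpha) * aging_stress[port]

    return min(candidates, key=cost)


# Hypothetical router state: north is less congested but more aged.
congestion = {"north": 0.2, "east": 0.6}
aging_stress = {"north": 0.9, "east": 0.1}
print(pick_output_port(["north", "east"], congestion, aging_stress))
```

With equal weighting the less-aged east port wins despite higher congestion; setting `alpha=1.0` reduces the rule to purely congestion-aware adaptive routing.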